[Data] Add UnresolvedExpr placeholder for future schema resolution#60794
[Data] Add UnresolvedExpr placeholder for future schema resolution#60794slfan1989 wants to merge 3 commits intoray-project:masterfrom
Conversation
This commit introduces UnresolvedExpr as a first-class expression type to represent column references that have not yet been resolved against a concrete schema. Changes: - Add UnresolvedExpr class with name property and no defined data_type - Update all expression visitors to handle UnresolvedExpr: * PyArrow/Iceberg visitors: fail with clear error messages * NativeExpressionEvaluator: reject evaluation with TypeError * ColumnReferenceCollector: collect unresolved names * ColumnSubstitutionVisitor: support substitution * Tree/inline repr visitors: display as UNRESOLVED(name) - Change StarExpr.data_type from DataType(object) to None for consistency - Change Expr.data_type type annotation to DataType | None - Add comprehensive unit tests for UnresolvedExpr - Update API documentation to include UnresolvedExpr Rationale: This prepares the expression system for schema-aware resolution while ensuring unresolved expressions cannot be accidentally evaluated or converted. Both StarExpr and UnresolvedExpr now share the same semantic: placeholders without concrete types. Note: This PR does not implement resolution logic. The resolver will be added in a follow-up PR. Signed-off-by: slfan1989 <slfan1989@apache.org>
There was a problem hiding this comment.
Code Review
This pull request introduces UnresolvedExpr as a placeholder for column references that have not yet been resolved. This is a solid foundational change that will enable future schema-aware resolution workflows. The changes are consistently applied across the expression system, including all relevant visitors and the base Expr class. The modification of Expr.data_type to DataType | None is a good move for semantic consistency. The accompanying unit tests are thorough and cover the new functionality well. I have one minor suggestion to improve a docstring for clarity. Overall, this is an excellent contribution.
python/ray/data/_internal/planner/plan_expression/expression_evaluator.py
Show resolved
Hide resolved
…valuator.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: slfan1989 <55643692+slfan1989@users.noreply.github.com>
This commit introduces UnresolvedExpr as a first-class expression type to represent column references that have not yet been resolved against a concrete schema. Changes: - Add UnresolvedExpr class with name property and no defined data_type - Update all expression visitors to handle UnresolvedExpr: * PyArrow/Iceberg visitors: fail with clear error messages * NativeExpressionEvaluator: reject evaluation with TypeError * ColumnReferenceCollector: collect unresolved names * ColumnSubstitutionVisitor: support substitution * Tree/inline repr visitors: display as UNRESOLVED(name) - Change StarExpr.data_type from DataType(object) to None for consistency - Change Expr.data_type type annotation to DataType | None - Add comprehensive unit tests for UnresolvedExpr - Update API documentation to include UnresolvedExpr Rationale: This prepares the expression system for schema-aware resolution while ensuring unresolved expressions cannot be accidentally evaluated or converted. Both StarExpr and UnresolvedExpr now share the same semantic: placeholders without concrete types. Note: This PR does not implement resolution logic. The resolver will be added in a follow-up PR. Signed-off-by: slfan1989 <slfan1989@apache.org>
|
Hi @slfan1989!, thanks for the contribution. Right now, we are currently revamping the Expressions to support Unresolved vs Resolved Expressions. The team right now is currently aligning on the exact design, and in case ur curious, here is the prototype #59117. Since this change and subsequent changes need to be very foundational and extendable, i'm going to hold off reviewing this PR until we can be sure this is the right approach. You're welcome to be a part of that discussion too through our public slack channel |
@iamjustinhsu Thank you for the thoughtful response and for the update on the ongoing revamp! I’m really excited about the direction of supporting I’ll check out prototype #59117 to better understand the planned changes. In the meantime, if there are any parts of my current implementation that align with (or conflict with) the new design, I’d really appreciate any early feedback so I can refactor accordingly. Also, I’d be happy to join the discussion in the public Slack channel to learn more and see if there’s any way I can help contribute. Thanks again for your time and for keeping the project moving forward! |
Description
This PR introduces
UnresolvedExpras a new expression type to represent column references that have not yet been resolved against a concrete schema. This is a foundational change that prepares the expression system for future schema-aware resolution workflows.Key changes:
UnresolvedExprclass withnameproperty anddata_type: NoneUnresolvedExpr(PyArrow, Iceberg, evaluator, collectors, repr visitors)StarExpr.data_typeandExpr.data_typetoDataType | Nonefor semantic consistencyWhy this is needed:
Currently, the expression system cannot represent column references that haven't been resolved against a schema. This is needed for string-based expression parsing, lazy schema evaluation, and deferred type checking scenarios.
UnresolvedExprserves as an explicit placeholder that fails fast when accidentally evaluated or converted, preventing subtle bugs.Note: This PR does not implement resolution logic.
UnresolvedExprcurrently cannot be evaluated or converted—it only serves as a placeholder. The resolver will be added in a follow-up PR.Related issues
Additional information
Implementation Details
New Expression Type:
UnresolvedExpr(name: str)- immutable dataclass with no defineddata_typeUNRESOLVED('name')(tree) /unresolved('name')(inline)Visitor Updates:
All expression visitors now handle
UnresolvedExpr:_PyArrowExpressionVisitor,_IcebergExpressionVisitor): RaiseTypeErrorwith clear error messagesNativeExpressionEvaluator): Rejects evaluation with helpful error_ColumnReferenceCollector): Collects unresolved names like regular column references_ColumnSubstitutionVisitor): Supports substituting unresolved expressionsType System Alignment:
Both
StarExprandUnresolvedExprnow usedata_type: Noneinstead ofDataType(object), making it explicit that these are placeholders without concrete types.Follow-up Work Plan
The resolution mechanism will be implemented in subsequent PRs: